Unsupervised Feature Selection for the $k$-means Clustering Problem
Abstract
We present a novel feature selection algorithm for the k-means clustering problem. Our algorithm is randomized and, given an accuracy parameter ε ∈ (0, 1), selects and appropriately rescales, in an unsupervised manner, Θ(k log(k/ε)/ε²) features from a dataset of arbitrary dimensions. We prove that, if we run any γ-approximate k-means algorithm (γ ≥ 1) on the features selected using our method, we can find a (1 + (1 + ε)γ)-approximate partition with high probability.
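The abstract does not spell out how features are sampled or rescaled, so the following is only a minimal sketch of one plausible reading: it assumes features are drawn with probabilities proportional to the squared row norms of the top-k right singular vectors, and that each sampled feature is rescaled by 1/sqrt(r·p_i). The function name `select_features`, the parameter `r` (number of sampled features), and the sampling distribution itself are assumptions made for illustration; the constant hidden in the Θ(·) bound is not given in the abstract, so `r` is left to the caller.

```python
import numpy as np

def select_features(A, k, r, seed=0):
    """Randomly select and rescale r features (columns) of A.

    Hypothetical helper: the sampling distribution below is an
    assumption, not something stated in the abstract.
    """
    rng = np.random.default_rng(seed)
    # Top-k right singular vectors of A; rows of Vt are orthonormal.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T                      # shape (d, k)
    # Assumed sampling probabilities: squared row norms of Vk.
    p = np.sum(Vk ** 2, axis=1)
    p = p / p.sum()                    # normalize so probabilities sum to 1
    # Draw r feature indices i.i.d. with replacement.
    idx = rng.choice(A.shape[1], size=r, replace=True, p=p)
    # Rescale each sampled column by 1 / sqrt(r * p_i).
    scale = 1.0 / np.sqrt(r * p[idx])
    return A[:, idx] * scale, idx

# Usage: reduce a 50 x 20 dataset to 8 rescaled features for k = 3.
A = np.random.default_rng(1).standard_normal((50, 20))
C, idx = select_features(A, k=3, r=8)
```

Any γ-approximate k-means routine could then be run on the rows of `C` instead of the rows of `A`.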
Similar resources
A New Hybrid Feature Selection using Natural Language Processing for Text Clustering
Text clustering is an unsupervised machine learning method. It needs a representation of objects and a similarity measure that compares the distribution of features between objects. Because of the high dimensionality of the feature space, the performance of clustering algorithms decreases. Two techniques are used to deal with this problem: feature extraction and feature selection. In this paper, we describe the hybrid met...
A hybrid DEA-based K-means and invasive weed optimization for facility location problem
In this paper, instead of the classical approach to the multi-criteria location selection problem, a new approach was presented based on selecting a portfolio of locations. First, the indices affecting the selection of maintenance stations were collected. The K-means model was used for clustering the maintenance stations. The optimal number of clusters was calculated through the Silhou...
Research of Feature Selection for Text Clustering Based on Cloud Model
Text clustering belongs to unsupervised machine learning; the discriminability of class attributes cannot be measured in clustering, and traditional text feature selection methods cannot effectively solve the high-dimensional problem. To overcome this weakness in existing feature selection, this paper proposes a new method that introduces cloud model theory into feature selection, c...
Hybrid Active Feature Selection For Text Classification
Clustering is the most common form of unsupervised learning. In clustering, it is the distribution and makeup of the data that determine cluster membership. It needs a representation of objects and a similarity measure that compares the distribution of features between objects. For high-dimensional data, feature extraction and feature selection improve the performance of clustering algorithms....
Clustering with feature selection using alternating minimization, Application to computational biology
This paper deals with unsupervised clustering with feature selection. The problem is to estimate both labels and a sparse projection matrix of weights. To address this combinatorial non-convex problem while maintaining strict control on the sparsity of the matrix of weights, we propose an alternating minimization of the Frobenius norm criterion. We provide a new efficient algorithm named K-sparse w...